Abstract

The purpose of this study is to examine a neural network based approach to predicting achievement in graduate education
for prospective Elementary Mathematics teachers. The study shows that students' achievement in graduate education can be
predicted effectively with Artificial Neural Networks (ANN). Two different neural networks are used. The first network
takes as its input the scores in core courses taken by prospective Mathematics teachers in their first two years, namely
General Mathematics, Pure Mathematics, Analysis I, Analysis II, Geometry and Linear Algebra I. These scores are fed to a
back-propagation neural network (BPNN). The scores in the vocational core courses taken by third-year students, namely
Analysis III, Special Teaching Methods 2, Elementary Number Theory, Algebra and Problem Solving, are used as the output
of this BPNN. The second network uses the scores of all the previously mentioned courses as the input of the BPNN and the
ALES (Academic Personnel and Postgraduate Education Entrance Exam) score as its output. In addition, a correlation
analysis is conducted between the average graduation score and the ALES score. Analytical results demonstrate that the
BPNN model offers relatively accurate predictions of student success in graduate education (average accuracy of 77.125%
for Neural Network 1 and 68.5% for Neural Network 2). Another finding is that there is no significant correlation between
the graduation average scores of candidates who qualified for graduate education and their ALES scores. These results
indicate that BPNN is a suitable tool for predicting the academic success of students in education majors. Student career
advisors can use the ANN model to identify students who have particular potential for graduate education, and this
prediction model can help these students adjust their own teaching strategies and provide guidance and support for their
careers.

Keywords: artificial neural network, back-propagation neural network, mathematics teacher education, post-graduation 
education, predict 

1. Introduction 

Predicting students’ academic performance is critical for educational institutions, because it allows them to develop 
strategic programs that will help to improve students’ performances during their period of study in an institution. 
Graduate education is becoming increasingly important in the academic teaching industry as well as in other sectors.
The potential for access to higher education is increasing because of individual, institutional, social, economic and
other factors (Tural 2004; Ekinci 2009). Especially in the last 20-25 years, increasing the dissemination of and
participation in higher education has become one of the country's growing political aims (Ekinci 2011).

In the new millennium, one of the core issues in educational reform is the development of a highly qualified teaching 
force that will be prepared to meet the demands of the paradigm shifts in school education (Fullan 1998). In such an era 
of fast transformations, teachers, as key agents in school education are expected to face new expectations, challenges 
and uncertainties in the discharge of their professional duties of preparing their young people for success in the new 
century (McGhan 2002; Wheatley 2002). International research has identified clearly the impact of teachers on the 
learning of young people (Hattie 2002). The question is often raised about how adequately teacher education has 
empowered teachers to take up the new roles and perform teaching effectively. 

Teacher education provided by education faculties in Turkey is carried out in a way similar to that of comparable
institutions in other countries. Arslan & Ozpinar (2008), after reviewing recommended source books and interviewing
prospective teachers, stated that future Turkish teachers have been educated to have essential fundamental skills and
knowledge. Although prospective teachers are given sufficient pre-service education, certain studies show that teachers
who are currently serving have difficulties in many different areas, for example: applying the new primary school
curriculum (Dindar & Yangin, 2007); integrating ICT into instruction (Goktas, Yildirim & Yildirim, 2008; Usluel, Mumcu &
Demiraslan, 2007; Umay, 2004); reflective teaching skills (Duban & Kucukyilmaz, 2008); assessment (Orbeyi & Guven, 2008;
Duban & Kucukyilmaz, 2008); and teaching and learning processes (Kincal, Ergul & Timur, 2007).

Journal of Education and Training Studies, Vol. 4, No. 5; May 2016

All the above show that teachers still have certain weaknesses after the pre-service education they received. On the 
other hand, even experienced teachers face unexpected, extreme situations and problems that need to be overcome. At 
the same time, in this age of fast technological and social advancements, student needs are diversifying continuously 
along with the fast propagating knowledge (Alan and Kara, 2010). 

In this context, graduate education is becoming more important. Graduate education gives teachers the opportunity to
gain new knowledge and skills, to influence student learning by practicing this knowledge and these skills, and to
contribute to school development by affecting other teachers. Teachers are supposed to have this knowledge and these
skills by the end of their faculty education (Day, 1999). The basic skills of mathematics teachers, which relate to their
personalities and professions, the teaching-learning process and assessment, are acquired or improved in graduate
education (Alan and Kara, 2010).

Prospective Mathematics teachers prefer graduate education, because they want to teach effectively, continue their 
academic careers or find a more qualified job in private schools. The approval for graduate education is limited, but 
students’ demands are increasing. Since there are limited chances for being a public teacher, prospective teachers try to 
have a graduate education to become more qualified for other choices. 

Not all students enter graduate programs in Turkey because of the undergraduate and graduate education policy. The
ALES (Academic Personnel and Postgraduate Education Entrance Exam) score and the average graduation score are the most
important components in qualifying for graduate education in Turkey. ALES is an exam used to indicate the level of basic
logic and reasoning skills. Therefore, the majority of prospective mathematics teachers can get high scores on this
examination.

When we analyze the studies focusing on teachers' graduate education, we see that they cluster around certain issues:
problems faced by teachers having graduate degrees and suggestions to solve them (Kuzu & Becit, 2007; Guven & Tung, 2007;
Aslan, 2007; Bakioglu & Gürdal, 2001), the expectations of graduate students and the level to which these expectations
are realized (Demir, 2007), and the efficiency of the graduate education obtained (Bumen, 2006). In an exceptional study,
Alhas (2006) used a questionnaire to determine the opinions of teachers having graduate degrees about the skills they
acquired and the functions of graduate education.

The purpose of this research is to make an effective prediction of students' success in graduate education with
Artificial Neural Networks (ANN), which are used as an effective prediction method in various sectors, as an alternative
to traditional methods in the field of education. No studies, at least in Turkey, were found that predict students'
success in graduate education with ANN.

Many of the prior studies involving graduate student performance have used linear regression models to estimate
student academic performance based on the pre-admission record of achievements. Gayle and Jones (1973) and Baird (1975)
found a significant positive relationship between Graduate Record Examination (GRE) scores and graduate grade point
average (GPA) in graduate students. Lee (2010) used a neural network based approach for predicting the learning effect
in design students with an average accuracy of 93.54%. Jun (2005) and Herrera (2006) provide a comprehensive overview of
the theoretical models describing student persistence and dropout in both contact and distance education institutions.
Deckro and Woundenberg (1972) studied nine variables as possible predictors of academic success among Kent State MBA
students.

Paolillo (1982) employed step-wise regression in his study and found that the applicant's junior and senior
undergraduate grade point average was the first variable to enter the equation. Schwan (1988) found GPA to be
significantly correlated with GMAT score, undergraduate grade point average, and junior/senior grade point average
among Murray State University MBA students.

This research aims to make an effective prediction of students' success in graduate education with Artificial Neural
Networks (ANN). Two different neural networks are used. The first network takes the grades of the courses taken by
first-year Mathematics teacher candidates, namely General Mathematics, Pure Mathematics, Analysis I, Analysis II,
Geometry and Linear Algebra I, as the input of the back-propagation neural network (BPNN). The grades of the professional
core courses at the upperclassman level, namely Analysis III, Special Teaching Methods 2, Elementary Number Theory,
Algebra and Problem Solving, are used as the output of this BPNN. The second network takes the grades of all the courses
mentioned above as the input of the BPNN and the ALES (Academic Personnel and Postgraduate Education Entrance Exam) score
as its output. Additionally, a correlation analysis was carried out between the average graduation score and the ALES
score.

“A Neural Network is basically a simplified model of the way the human brain processes information. It works by 
simulating a large number of interconnected simple processing units that resemble abstract versions of neurons. The 
processing units are arranged in layers. There are typically three parts in a neural network: an input layer, with units 
representing the input fields; one or more hidden layers; and an output layer, with a unit or units representing the output 
field(s). The units are connected with varying connection strengths (or weights). Input data are presented to the first 
layer, and values are propagated from one neuron to the other neuron on the next layer. Eventually, a result is delivered 
from the output layer. The network learns by examining individual records, generating a prediction for each record, and 
making adjustments to the weights whenever it makes an incorrect prediction. This process is repeated many times, and 
the network continues to improve its predictions until one or more of the stopping criteria have been met. Initially, all 
weights are random, and the answers that come out of the net are probably nonsensical. The network learns through 
training. Examples for which the output is known are repeatedly presented to the network, and the answers it gives are 
compared to the known outcomes. Information from this comparison is passed back through the network, gradually 
changing the weights. As training progresses, the network becomes increasingly accurate in replicating the known 
outcomes. Once trained, the network can be applied to future cases where the outcome is unknown” (SPSS, 2006). 
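
The layered propagation described in the quotation can be illustrated with a minimal sketch. It assumes a sigmoid activation and one bias per unit, neither of which is specified in the quoted description; the function names, layer sizes and input values are hypothetical and chosen only for illustration.

```python
import math
import random

def sigmoid(x):
    """Logistic activation (an assumption; the quoted text names no activation)."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """One weight row per unit; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def layer_forward(weights, inputs):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in weights]

def forward(hidden_w, output_w, inputs):
    """Propagate values from the input layer through the hidden layer to the output."""
    hidden = layer_forward(hidden_w, inputs)
    return layer_forward(output_w, hidden)

rng = random.Random(0)            # initially, all weights are random
hidden_w = init_layer(3, 4, rng)  # 3 input fields, 4 hidden units (hypothetical sizes)
output_w = init_layer(4, 1, rng)  # 1 output field
prediction = forward(hidden_w, output_w, [0.2, 0.7, 0.5])
print(prediction)
```

Because the weights are random at this point, the printed value carries no meaning yet; training is what adjusts the weights toward the known outcomes.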

“Several types of neural networks exist. Among them, the feed-forward neural networks are the most popular 
architectures due to their structural flexibility, good representational capabilities, and availability of a large number of 
training algorithms (Haykin, 1999). This Network comprises neurons arranged in layers in which every neuron is 
connected to all neurons of the next layer (a fully connected network). Multilayer perceptron neural networks (MLPs)
are a type of feed-forward network consisting of an input layer of nodes followed by two or more layers of neurons with 
the last layer being the output layer. The input layer is first layer and it accepts symptoms, signs, and experimental data. 
The layers between the input and output layers are referred to as hidden layers. Outputs of neurons in one layer are 
inputs for the next layer. There are no connections between non-adjacent layers and no connections between neurons in 
the same layer. Connections between layers go in only one direction, i.e. there are no feedbacks. The back-propagation 
algorithm is most widely used to adjust the network parameters which were established by Rumelhart, Hinton, and 
Williams (1986). According to this algorithm, information is passed forward from the input nodes through the hidden 
layers to the output nodes and the error between the desired response and the actual response of the network is 
computed. This error signal is then propagated backwards to the input neurons adjusting the network weights and 
biases. 

This process is repeated for each sample in the training set. As soon as the entire training set has been presented to the 
network, an epoch has elapsed. The training phase may comprise several epochs (Lykourentzou, Giannoukos, Mpardis, 
Nikolopoulos, & Loumos, 2009). A popular approach to optimizing the performance of back-propagation is the 
Levenberg-Marquardt algorithm (Marquardt, 1963) which has been exposed to increase the speed of convergence and 
effectiveness of the moderate-sized network training (Hagan, Demuth, & Beale, 1996; Hagan & Menhaj, 1994). During 
the training phase, a network may end up memorizing the training data and thus lose its ability to generalize from the 
training samples to an unseen population. This phenomenon is called over-fitting and can be avoided by employing a 
separate data set called the validation set. The network parameters are estimated based only on the training set and the 
performance of the network is assessed by computing the MSE on the validation set. When the network performance 
deteriorates, it usually means that over-fitting has occurred. Training then stops and the parameters of the best 
previously trained network are stored. The training phase can be terminated by reaching a minimum in the cost function, 
by meeting the performance goal, or by detecting that the validation set produced increasing MSE. Finally, after the 
training is finished, the network test phase occurs. During this phase, unseen data are presented to the trained network to 
appraise its performance. These data comprise the test set that is disjoint to both the training and the validation data sets” 
(Kardan et al, 2013:4). 
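
The training cycle described in the quotation (forward pass, backward propagation of the error, epochs, and stopping when the validation MSE no longer improves) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in this study: it uses plain per-sample gradient descent rather than the Levenberg-Marquardt algorithm, a sigmoid activation, a single output unit, and synthetic data. The names `fit`, `patience` and the learning rate are hypothetical.

```python
import copy
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(net, x):
    """Forward pass: returns the hidden activations and the single output."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in net["hidden"]]
    out = sigmoid(sum(w * h for w, h in zip(net["out"][:-1], hidden))
                  + net["out"][-1])
    return hidden, out

def mse(net, data):
    return sum((t - predict(net, x)[1]) ** 2 for x, t in data) / len(data)

def train_epoch(net, data, lr):
    """One pass over the training set, updating weights after each sample."""
    for x, t in data:
        hidden, out = predict(net, x)
        delta_out = (out - t) * out * (1.0 - out)
        # Error signal propagated backwards, using the weights before the update.
        delta_hidden = [delta_out * net["out"][j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            net["out"][j] -= lr * delta_out * h
        net["out"][-1] -= lr * delta_out
        for j, row in enumerate(net["hidden"]):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]

def fit(train, valid, n_hidden=4, lr=0.5, max_epochs=500, patience=20, seed=0):
    """Train until the validation MSE has not improved for `patience` epochs."""
    rng = random.Random(seed)
    n_in = len(train[0][0])
    net = {"hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)],
           "out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]}
    history = [mse(net, valid)]
    best_net, best_err, waited = copy.deepcopy(net), history[0], 0
    for _ in range(max_epochs):
        train_epoch(net, train, lr)
        err = mse(net, valid)
        history.append(err)
        if err < best_err:
            # Store the parameters of the best network seen so far.
            best_net, best_err, waited = copy.deepcopy(net), err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_net, best_err, history

# Synthetic data (not from the study): label is 1 when the two inputs sum above 1.
rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(120)]
data = [(p, 1.0 if p[0] + p[1] > 1.0 else 0.0) for p in points]
train, valid, test = data[:80], data[80:100], data[100:]

best_net, best_err, history = fit(train, valid)
test_err = mse(best_net, test)  # performance on data unseen during training
print(len(history) - 1, best_err, test_err)
```

The three disjoint subsets mirror the roles in the quotation: the training set drives the weight updates, the validation set decides when to stop, and the test set is used only once at the end.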

2. Method 

This study analyzes the use of the BPNN model as a decision-making tool to predict the learning effect for students of
Elementary Mathematics Teaching. To predict the academic achievements of prospective teachers, the students' scores from
the specified courses during graduate education and their ALES scores were taken into consideration, and all these data
were used as predictive variables. Additionally, a correlation analysis was carried out between the graduation grade
point average (GPA) and the ALES score.

2.1 Collecting Data 

The study collected the grade information of students who graduated from the Department of Mathematics Education.
The information comprised not only the first-year grades of the students in General Mathematics, Pure Mathematics,
Analysis I, Analysis II, Geometry and Linear Algebra I, but also the professional core course grades at the next grade
level, namely Analysis III, Special Teaching Methods 2, Elementary Number Theory, Algebra and Problem Solving.

Grade information was recorded for 220 students in total. This information was then used in the training and testing
stages of the BPNN. To assess the ability of the BPNN model to predict the learning effect in students of Elementary
School Mathematics teaching, 176 data sets (80% of the total grade information) were randomly selected from the 220 data
sets and used for building the BPNN model, i.e. as the training samples. The remaining 44 data sets (20% of the total
grade information) were then used to test the prediction accuracy of the BPNN model, i.e. as the testing samples.
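
The random 80/20 partition described above can be sketched as follows. The function name `split_records`, the fixed seed and the integer stand-ins for student records are assumptions made for illustration; the source does not state how the random selection was implemented.

```python
import random

def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

records = list(range(220))  # stand-ins for the 220 students' grade records
train, test = split_records(records)
print(len(train), len(test))
```

With 220 records and a training fraction of 0.8, this yields the 176 training samples and 44 testing samples reported above, and no record appears in both parts.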

Table 1. The Data Set of the Implementation

Data Set No   Data Set (3 Different Universities, Educational Sciences,   Input Years   Number of Data   Total
              Teaching Elementary School Mathematics)
-----------   ---------------------------------------------------------   -----------   --------------   -----
1             Students who are getting/got a master's degree between      2006          4                80
              2006-2010 (only Marmara University)                         2007          6
                                                                          2008          5
                                                                          2009          11
                                                                          2010          13
                                                                          2011          15
                                                                          2012          14
                                                                          2013          12
2             Students who graduated from graduate education in years     2008-2011     140              140
              2010-2011 (3 Different Universities)
3             Students who entered a faculty in 2010 and continue         2010-2014     152              152
              his/her graduate education (3 Different Universities)

Data were collected from 3 different universities (Marmara University, Yildiz Technical University, Necmettin Erbakan
University). The data used for training the network were taken from Marmara and Necmettin Erbakan Universities, because
the graduate education program of Yildiz Technical University is not ready yet.

The data set is separated randomly into two sections, a training set and a test set. The training set is used to fit
the weights of the network to the data, and the test set is used only to evaluate the performance of the trained
classifier. The training set makes up 80% of the data set and the test set makes up 20%. In addition, before this
analysis, the number of neurons in the hidden layer is determined by examining the performance of the classifier on a
separate set called the validation set.

In order to decide on the number of hidden layers for a problem, the performance on the validation data is analyzed. In
order to test the performance of the network structure, the Mean Square Error (MSE) and the Mean Absolute Error (MAE)
are employed.
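
The two error measures named above follow their standard definitions, sketched here for reference. The target and prediction values are made up for illustration and are not taken from the study's data.

```python
def mean_square_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mean_absolute_error(targets, predictions):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

targets = [1, 0, 1, 0]              # hypothetical known outcomes
predictions = [0.9, 0.2, 0.6, 0.4]  # hypothetical network outputs
print(mean_square_error(targets, predictions))
print(mean_absolute_error(targets, predictions))
```

MSE penalizes large individual errors more heavily than MAE, which is why the two are often reported side by side.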

For the network, 1000 iterations were carried out. As a result of the neural network analysis, a separate classification
table was obtained for each set. The accuracy percentages obtained from the sets differ. To calculate the general
accuracy percentage, the results obtained from the three sets should be combined.

The percentage of accurate classification on the training set was found for each case, together with the mean absolute
error and the mean square error. A classification table was created for the training set, and the percentage of accurate
classification was found for the test set. To obtain the classification table for the ANN implementation, the training,
validation and test sets were combined; the assigned values are added together during this combination. The data
obtained are given in tables at the end of the section on the implementation.

3. Method of Analysis and Results 

Two different network structures were used in the study. The ALES score and the graduation grade point average are two
important requirements for admission to graduate education. BPNN was employed to recognize the links between the
graduation score and the ALES score in order to predict prospective teachers' entry into graduate education. In addition,
a correlation analysis was carried out between the graduation grade point average (GPA) and the ALES score.

Figure 1. Architecture of the three-layer BPNN in this study (Neural Network 1: 6-input, 5-output layered network model)

Network 1, shown in Figure 1, is a feed-forward neural network that consists of three layers. The input layer has a
total of 6 nodes, each of which represents one of the first-year courses. The output layer has 5 nodes, which represent
the upperclassman courses. Network 2, shown in Figure 2, is also a feed-forward neural network that consists of three
layers. Its input layer has a total of 11 nodes, each of which represents one of the courses. Its output layer has one
node, which represents the ALES score.

Input layer: 

The scores in General Mathematics, Pure Mathematics, Analysis I, Analysis II, Geometry and Linear Algebra I are taken as
the input variables (i.e. input nodes) of the BPNN. Therefore, the input layer contains a total of six nodes.

Output layer:

The scores in Analysis III, Special Teaching Methods 2, Elementary Number Theory, Algebra and Problem Solving are used
as the output variables (i.e. output nodes). The output layer contains these five nodes.

The academic achievement prediction was made with the field courses taken in the first years of the faculty, based on
the grade data of the 292 students. The network pattern of the BPNN with 6 inputs and 5 outputs used in the study can be
seen in Figure 1. The information given after the figure comprises the input and output layers of the ANN, the number of
examples and the transfer function of the BPNN model, together with the number of processing elements in the hidden
layer, the learning rule and the threshold level identified for obtaining the expected error value. Learning continued
until the mean square error (MSE) reached the threshold level, using the minimum function. When the expected error value
was reached, training was stopped before reaching 10,000 iterations.

58 of the 292 data records were used as the test data. The network was trained with the data in the training set, and
the possibility of generalizing the results produced by the network was tested. For the test data, 29 records in our
data table were selected as Successful ‘1’ and the other 29 as Unsuccessful ‘0’, so the total number of test records was
58. Training ran for 10,000 iterations and then ended. However, if the MSE for the validation set had increased,
training would have been stopped at that point, before 10,000 iterations. The weights were adjusted after all the
samples in the training set had been shown to the network.

Input layer for Figure 2:

The scores in General Mathematics, Pure Mathematics, Analysis I, Analysis II, Geometry, Linear Algebra I, Analysis III,
Special Teaching Methods 2, Elementary Number Theory, Algebra and Problem Solving are taken as the input variables
(i.e. input nodes) of the BPNN. Therefore, the input layer contains a total of eleven nodes.

Output layer for Figure 2:

The ALES score is used as the output variable (i.e. output node). The output layer contains one node. The academic
achievement prediction was made with the field courses taken in the first years of the faculty, based on the grade data
of the 140 students. The network pattern of the BPNN with 11 inputs and 1 output used in the study can be seen in
Figure 2. The information given after the figure comprises the input and output layers of the ANN, the number of examples
and the transfer function of the BPNN model, together with the number of processing elements in the hidden layer, the
learning rule and the threshold level identified for obtaining the expected error value. Learning continues until the
mean square error (MSE) reaches the threshold level, using the minimum function. When the expected error value was
reached, training was stopped before reaching 10,000 iterations.

28 of the 292 data records were used as the test data. The network was trained with the data in the training set, and
the possibility of generalizing the results produced by the network was tested. For the test data, 14 records in our
data table were selected as Successful ‘1’ and the other 14 as Unsuccessful ‘0’, so the total number of test records was
28. Training ran for 10,000 iterations and then ended. However, if the MSE for the validation set had increased,
training would have been stopped at that point, before 10,000 iterations. The weights were adjusted after all the
samples in the training set had been shown to the network.



Figure 2. Architecture of the three-layer BPNN in this study (Neural Network 2: 11-input, 1-output layered network model)

3.1 Experimental Results

Results of ANN 1

In our case, a back-propagation network with three layers seemed to be the most appropriate. The input layer had 6
neurons, the output layer had 5 neurons, and the hidden layer had 8 nodes. The prediction equation predicted
classification group membership correctly with an average of 77.125%.

Results of ANN 2

In our case, a back-propagation network with three layers seemed to be the most appropriate. The input layer had 11
neurons, the output layer had 1 neuron, and the hidden layer had 6 nodes. The prediction equation predicted
classification group membership correctly with an average of 68.5%.
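
As a rough indication of model size, the number of trainable parameters in the two architectures can be counted as follows. The sketch assumes fully connected layers with one bias per hidden and output unit; the source does not state whether bias terms were used, and the function name `count_parameters` is hypothetical.

```python
def count_parameters(n_input, n_hidden, n_output, bias=True):
    """Weights (and optional biases) of a fully connected three-layer network."""
    extra = 1 if bias else 0
    return (n_input + extra) * n_hidden + (n_hidden + extra) * n_output

print(count_parameters(6, 8, 5))   # Neural Network 1: 6 inputs, 8 hidden, 5 outputs
print(count_parameters(11, 6, 1))  # Neural Network 2: 11 inputs, 6 hidden, 1 output
```

Under these assumptions Network 1 has 101 parameters and Network 2 has 79; without bias terms the counts would be 88 and 72.
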

Table 2. Performance when tested with testing data.

Performance       Successful   Unsuccessful
---------------   ----------   ------------
MSE               0.22         0.21
NMSE              2.18         2.15
MAE               0.36         0.35
Min Abs Error     0.00         0.00
r                 0.21         0.21
Percent Correct   71.18        65.82

After optimizing the network’s structure and training it on the data within 1000 epochs, we tested the network’s
predictive power on the data set. The network correctly classified 68.5% of the data.
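
A "percent correct" figure of the kind reported in Table 2 can be computed as sketched below. The sketch assumes that a network output at or above 0.5 is read as Successful ‘1’ and anything below as Unsuccessful ‘0’; the threshold, the function name and the sample values are assumptions, since the source does not describe how outputs were mapped to classes.

```python
def percent_correct(targets, outputs, threshold=0.5):
    """Per-class and overall percentage of correctly classified records."""
    labels = [1 if o >= threshold else 0 for o in outputs]
    result = {}
    for cls in (1, 0):
        idx = [i for i, t in enumerate(targets) if t == cls]
        hits = sum(1 for i in idx if labels[i] == cls)
        result[cls] = 100.0 * hits / len(idx)
    hits = sum(1 for t, l in zip(targets, labels) if t == l)
    result["overall"] = 100.0 * hits / len(targets)
    return result

targets = [1, 1, 1, 1, 0, 0, 0, 0]                    # hypothetical known classes
outputs = [0.9, 0.7, 0.4, 0.8, 0.2, 0.6, 0.1, 0.55]   # hypothetical network outputs
scores = percent_correct(targets, outputs)
print(scores)
```

Reporting the two classes separately, as Table 2 does, shows whether the network is better at recognizing successful or unsuccessful students, which a single overall percentage would hide.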

Results of correlation analysis 

The correlation analysis was made between the average graduation point (GPA) and ALES score components. 

Table 3. The correlation between the average graduation and ALES score

                                                 The average graduation   ALES score
                                                 ----------------------   ----------
The average graduation   Pearson Correlation     1                        .422
                         Sig. (2-tailed)                                  .256
                         N                       80                       80
ALES score               Pearson Correlation     .422                     1
                         Sig. (2-tailed)         .256
                         N                       80                       80


According to the correlation results, no significant correlation was found between the average graduation score and the
ALES score (r = 0.422, p > .05). This analysis was carried out with the students who received or were receiving graduate
education between 2006 and 2013.
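
For reference, the Pearson coefficient reported in Table 3 follows the standard definition sketched below, and its two-tailed significance is conventionally judged by comparing a t statistic against the t distribution with N - 2 degrees of freedom. The sample values are made up for illustration and are not the study's data; the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

gpa = [2.8, 3.1, 3.4, 2.9, 3.6]        # hypothetical graduation averages
ales = [70.0, 74.0, 81.0, 78.0, 80.0]  # hypothetical ALES scores
r = pearson_r(gpa, ales)
print(r, t_statistic(r, len(gpa)))
```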

Table 4. Prediction results from two different Neural Networks

Methods            Overall Accuracy   Sum of Mean Square Error (MSE)   Mean Absolute Error (MAE)
----------------   ----------------   ------------------------------   -------------------------
Neural Network 1   77.125             0.18                             0.30
Neural Network 2   68.5               0.21                             0.35


Recognizing that prediction constitutes an excellent first step toward intervention, and considering the classification
power of ANNs, we turned to ANN technology to predict rates of successful academic achievement.

4. Discussion 

The research tested to what extent the artificial neural network model succeeded in predicting success and focused on
determining which of the predictions made with these models had successful outcomes. It also showed the value of the
ANN. With successful forecasts from prediction methods, it is possible to identify and guide prospective teachers who
are planning to pursue graduate education.

The contributory parameters can be identified by performing sensitivity analysis. Saltelli et al. (2004) and Karamouzis
& Vrettos (2008) have reached similar conclusions in their studies. Although the difference in the error values produced
by the prediction models mentioned in these studies was not high, the fact that the neural network model was more
successful than all the others on the data sets shows that the artificial neural network technique could be an
alternative to classical statistical methods in educational studies that depend on prediction. The findings are quite
consistent with other published research results, for example Lee (2010), Naik & Rogotman (2009), Ibrahim & Rush (2007),
Schwan (1988), and Amanatiadis, Mitsinis, & Maditinos (2014).

According to the correlation results, no significant correlation was found between the average graduation score and the
ALES score (r = 0.422, p > .05). This analysis was carried out with the students who received or were receiving graduate
education between 2006 and 2013.

ALES is an exam based on logic and reasoning skills, so the majority of prospective mathematics teachers receive high
marks on this examination. The absence of a significant relationship between GPA and ALES scores can be explained by the
high scores of the mathematics teaching students. These results indicate that BPNN is a suitable tool for predicting the
success of students who are studying in education majors. Student career advisors can use the ANN model to identify
students who have particular potential for graduate education, and this prediction model can help these students adjust
their own teaching strategies and provide guidance and support for their careers.

Future studies should identify the student parameters that can contribute to successful predictability rates. For
example, students’ genders, family incomes and foreign language knowledge can be taken into consideration in future
studies.